38 research outputs found

    pubmed2ensembl: A Resource for Mining the Biological Literature on Genes

    The last two decades have witnessed a dramatic acceleration in the production of genomic sequence information and publication of biomedical articles. Despite the fact that genome sequence data and publications are two of the most heavily relied-upon sources of information for many biologists, very little effort has been made to systematically integrate data from genomic sequences directly with the biological literature. For a limited number of model organisms, dedicated teams manually curate publications about genes; however, for species with no such dedicated staff, many thousands of articles are never mapped to genes or genomic regions. To overcome the lack of integration between genomic data and biological literature, we have developed pubmed2ensembl (http://www.pubmed2ensembl.org), an extension to the BioMart system that links over 2,000,000 articles in PubMed to nearly 150,000 genes in Ensembl from 50 species. We use several curated (e.g., Entrez Gene) and automatically generated (e.g., gene names extracted through text-mining on MEDLINE records) sources of gene-publication links, allowing users to filter and combine different data sources to suit their individual needs for information extraction and biological discovery. In addition to extending the Ensembl BioMart database to include published information on genes, we also implemented a scripting language for automated BioMart construction and a novel BioMart interface that allows text-based queries to be performed against PubMed and PubMed Central documents in conjunction with constraints on genomic features.
    Finally, we illustrate the potential of pubmed2ensembl through typical use cases that involve integrated queries across the biomedical literature and genomic data. By allowing biologists to find the relevant literature on specific genomic regions or sets of functionally related genes more easily, pubmed2ensembl offers a much-needed, genome-informatics-inspired solution to accessing the ever-increasing biomedical literature.

    BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains

    The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.

    Density functional theory based screening of ternary alkali-transition metal borohydrides: A computational material design project

    The dissociation of molecules, even the most simple hydrogen molecule, cannot be described accurately within density functional theory because none of the currently available functionals accounts for strong on-site correlation. This problem led to a discussion of properties that the local Kohn-Sham potential has to satisfy in order to correctly describe strongly correlated systems. We derive an analytic expression for the nontrivial form of the Kohn-Sham potential in between the two fragments for the dissociation of a single bond. We show that the numerical calculations for a one-dimensional two-electron model system indeed approach and reach this limit. It is shown that the functional form of the potential is universal, i.e., independent of the details of the two fragments. We acknowledge funding by the Spanish MEC (Grant No. FIS2007-65702-C02-01), “Grupos Consolidados UPV/EHU del Gobierno Vasco” (Grant No. IT-319-07), and the European Community through e-I3 ETSF project (Grant Agreement No. 211956). Peer reviewed.

    Genome-wide associations for birth weight and correlations with adult disease

    Birth weight (BW) has been shown to be influenced by both fetal and maternal factors and in observational studies is reproducibly associated with future risk of adult metabolic diseases including type 2 diabetes (T2D) and cardiovascular disease. These life-course associations have often been attributed to the impact of an adverse early life environment. Here, we performed a multi-ancestry genome-wide association study (GWAS) meta-analysis of BW in 153,781 individuals, identifying 60 loci where fetal genotype was associated with BW ($P < 5 \times 10^{-8}$). Overall, approximately 15% of variance in BW was captured by assays of fetal genetic variation. Using genetic association alone, we found strong inverse genetic correlations between BW and systolic blood pressure ($R_g = -0.22$, $P = 5.5 \times 10^{-13}$), T2D ($R_g = -0.27$, $P = 1.1 \times 10^{-6}$) and coronary artery disease ($R_g = -0.30$, $P = 6.5 \times 10^{-9}$). In addition, using large-cohort datasets, we demonstrated that genetic factors were the major contributor to the negative covariance between BW and future cardiometabolic risk. Pathway analyses indicated that the protein products of genes within BW-associated regions were enriched for diverse processes including insulin signalling, glucose homeostasis, glycogen biosynthesis and chromatin remodelling. There was also enrichment of associations with BW in known imprinted regions ($P = 1.9 \times 10^{-4}$). We demonstrate that life-course associations between early growth phenotypes and adult cardiometabolic disease are in part the result of shared genetic effects and identify some of the pathways through which these causal genetic effects are mediated. For a full list of the funders, please visit the publisher's website and look at the supplementary material provided.
    Some of the funders are: British Heart Foundation, Cancer Research UK, Medical Research Council, National Institutes of Health, Royal Society and Wellcome Trust.

    The health care and life sciences community profile for dataset descriptions

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.